European Union Language Resources in Sketch Engine

نویسندگان

  • Vít Baisa
  • Jan Michelfeit
  • Marek Medved
  • Milos Jakubícek
چکیده

Several parallel corpora built from European Union language resources are presented here. They were processed by state-of-the-art tools and made available for researchers in the Sketch Engine corpus management system. A completely new resource is introduced: EUR-Lex corpus, being one of the largest parallel corpus available at the moment, containing 840 million tokens of English and having the largest language pair (English-French) with more than 25 million aligned segments (paragraphs).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sketching the Dependency Relations of Words in Chinese

We proposes a language resource by automatically sketching grammatical relations of words based on dependency parses from untagged texts. The advantage of word sketch based on parsed corpora is, compared to Sketch Engine (Kilgarriff, Rychly, Smrz, & Tugwell, 2004), to provide more details about the different usage of each word such as various types of modification, which is also important in la...

متن کامل

Comparing Lexical Relationships Observed within Japanese Collocation Data and Japanese Word Association Norms

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...

متن کامل

Coling 2008 22 nd International Conference on Computational Linguistics

While large-scale corpora and various corpus query tools have long been recognized as essential language resources, the value of word association norms as language resources has been largely overlooked. This paper conducts some initial comparisons of the lexical relationships observed within Japanese collocation data extracted from a large corpus using the Japanese language version of the Sketc...

متن کامل

Terminology finding in the Sketch Engine: an evaluation

The Sketch Engine is a leading corpus query tool, in use for lexicography at OUP, CUP, Collins and Le Robert, and at national language institutes of eight countries, and for teaching and research in many universities. Its distinctive feature is the ‘word sketch’ a one page, automatic, corpus, derived summary of a word’s grammatical and collocational behaviour. Very large corpora and word sketch...

متن کامل

Chinese Sketch Engine and the Extraction of Grammatical Collocations

This paper introduces a new technology for collocation extraction in Chinese. Sketch Engine (Kilgarriff et al., 2004) has proven to be a very effective tool for automatic description of lexical information, including collocation extraction, based on large-scale corpus. The original work of Sketch Engine was based on BNC. We extend Sketch Engine to Chinese based on Gigaword corpus from LDC. We d...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016